A Grammar-informed Corpus-based Sentence Database for Linguistic and Computational Studies
نویسندگان
چکیده
We adopt the corpus-informed approach to example sentence selections for the construction of a reference grammar. In the process, a database containing sentences that are carefully selected by linguistic experts including the full range of linguistic facts covered in an authoritative Chinese Reference Grammar is constructed and structured according to the reference grammar. A search engine system is developed to facilitate the process of finding the most typical examples the users need to study a linguistic problem or prove their hypotheses. The database can also be used as a training corpus by computational linguists to train models for Chinese word segmentation, POS tagging and sentence parsing.
منابع مشابه
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...
متن کاملAn annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
متن کاملAllophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
متن کاملe-Learning for English based on Multimedia Database and Internet
In this time of Internet delivery, learning through Internet will be popular and enhance the efficiency of teaching. This paper presents an Internet-based distance learning system for English learning through multimedia database and Internet technologies, it is called "multimedia English corpus". It includes two major learning functions. One of them provides Articles, Dialogs, and Videos databa...
متن کاملCharles J. Fillmore
Charles J. Fillmore died at his home in San Francisco on February 13, 2014, of brain cancer. He was 84 years old. Fillmore was one of the world’s pre-eminent scholars of lexical meaning and its relationship with context, grammar, corpora, and computation, and his work had an enormous impact on computational linguistics. His early theoretical work in the 1960s, 1970s, and 1980s on case grammar a...
متن کامل